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Abstract 

We develop a method for sparse image reconstruction from polychromatic computed tomography (CT) mea¬ 
surements under the blind scenario where the material of the inspected object and the incident-energy spectrum are 
unknown. We obtain a parsimonious measurement-model parameterization by changing the integral variable from 
photon energy to mass attenuation, which allows us to combine the variations brought by the unknown incident 
spectrum and mass attenuation into a single unknown mass-attenuation spectrum function; the resulting measurement 
equation has the Laplace integral form. The mass-attenuation spectrum is then expanded into basis functions using 
B-splines of order one. We derive a block coordinate-descent algorithm for constrained minimization of a penalized 
negative log-likelihood (NLL) cost function, where penalty terms ensure nonnegativity of the spline coefficients and 
nonnegativity and sparsity of the density map image. The image sparsity is imposed using convex total-variation 
(TV) and £i norms, applied to the density-map image and its discrete wavelet transform (DWT) coefficients, 
respectively. This algorithm alternates between Nesterov’s proximal-gradient (NPG) and limited-memory Broyden- 
Fletcher-Goldfarb-Shanno with box constraints (L-BFGS-B) steps for updating the image and mass-attenuation 
spectmm parameters. To accelerate convergence of the density-map NPG step, we apply a step-size selection scheme 
that accounts for varying local Lipschitz constant of the NLL. We consider lognormal and Poisson noise models and 
establish conditions for biconvexity of the corresponding NLLs with respect to the density map and mass-attenuation 
spectmm parameters. We also prove the Kurdyka-Lojasiewicz property of the objective function, which is important 
for establishing local convergence of block-coordinate schemes in biconvex optimization problems. Numerical 
experiments with simulated and real X-ray CT data demonstrate the performance of the proposed scheme. 

I. Introduction 

X-ray computed tomography (CT) measurement systems are important in modern nondestructive eval¬ 
uation (NDE) and medical diagnostics. The past decades have seen great progress in CT hardware and 
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(reconstruction) software development. CT sees into the inspected object and gives 2D and 3D reconstruction 
at a high resolution. It is a fast high-resolution method that can distinguish a density difference as small 
as 1 % between materials. As it shows the finest detail of the inside, it has been one of the most important 
techniques in medical diagnosis, material analysis and characterization, and NDE [1-3]. 

Therefore, improving reconstruction accuracy and speed of data collection in these systems could have a 
significant impact on these broad areas. Thanks to recent computational and theoretical advances, such as 
graphics processing units (GPUs) and sparse signal reconstruction theory and methods, it is now possible 
to design iterative reconstruction methods that incorporate accurate nonlinear physical models into sparse 
signal reconstructions from significantly undersampled measurements. 

However, due to the polychromatic nature of the X-ray source, linear reconstructions such as filtered 
backprojection (FBP) exhibit beam-hardening artifacts, e.g., cupping and streaking [4]. These artifacts 
limit the quantitative analysis of the reconstruction. In medical CT application, severe artifacts can even 
look similar to certain pathologies and further mislead the diagnosis [5, Sec. 7.6.2]. Fulfilling the promise 
of compressed sensing and sparse signal reconstruction in X-ray CT depends on accounting for the 
polychromatic measurements as well. It is not clear how aliasing and beam-hardening artifacts interact and 
our experience is that we cannot achieve great undersampling when applying sparse linear reconstruction 
to polychromatic measurements. Indeed, the error caused by the model mismatch may well be larger than 
the aliasing error that we wish to correct via sparse signal reconstruction. 

In this paper (see also [6, 7]), we adopt the nonlinear measurement scenario resulting from the 
polychromatic X-ray source and simplify the measurement model by exploiting the relation between 
the mass attenuation coefficients. X-ray photon energy and incident spectrum, see Fig. la. This simplified 
model allows blind density-map reconstruction and estimation of the composite mass attenuation spectrum 
L(fi;) (depicted in Fig. la) in the case where both the mass attenuation and incident spectrum are unknown. 

We introduce the notation: /at, Iatxi, and Oatxi are the identity matrix of size N and the iV x 1 vectors 
of ones and zeros (replaced by /, 1, and 0 when the dimensions can be inferred easily); e„ = /(:,n) is a 
column vector with the nth element equal to one and the remaining elements equal to zero; l-j, H-Hp, and 
are the absolute value, ^p norm, and transpose, respectively. Denote by \x] the smallest integer larger 
than or equal to a; G M. For a vector ck = [ai,..., E define the nonnegativity indicator function 
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Figure 1: (a) Mass-attenuation spectrum L(fi:) obtained by combining the mass attenuation K{e) and incident 
spectrum L{e), and (b) its B1-spline expansion. 


and projector 


I[0,+oo) (q:) 


0 , 


a ^ 0 


-l-cxD, Otherwise 
[(a)+]. = max(ai,0) 


(la) 

(lb) 


and the soft-thresholding operator [T\{ol)~\. = sign (a j) max (|aj| — A,0), where is the elementwise 
version of Furthermore, a^(s) = J a{K)e~^^ dn is the Laplace transform of a vector function a(K) 
and 

((-K)'"a)^ (s) = j (-K)™a(K)e“^'" dn = m e Nq. (Ic) 

Define also the set of nonnegative real numbers as M+ = [0, -foo), the elementwise logarithm lno(a;) = 
[Inxi,..., In nonnegativity projector [(»;)+]* = max{a:i,0} where x = [xi,X2, ■ ■ ■, xat ]^, and Laplace 
transforms ao(s) = (<3'^('Sn))^^^ and (Ka)^(s) = ((«^a')^(sn))^^j^ obtained by stacking a^(s„) and 
{Ka)^{sn) columnwise, where s = [si, S2,..., Finally, supp(t(-)) is the support set of a function t(-), 
dom(/) = {a; G I f{x) < -1-cxd} is the domain of function /(■), and diag(a;) is the diagonal matrix 
with diagonal elements defined by the corresponding elements of vector x. 


In the following, we review the standard noiseless polychromatic X-ray CT measurement model 
(Section I-A), existing signal reconstruction approaches from polychromatic measurements (Section I-B), 
and our mass-attenuation parameterization of this model (Section I-C). 
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A. Polychromatic X-ray CT Model 


By the exponential law of absorption [8], the fraetion dX/X of plane wave intensity lost in traversing 
an infinitesimal thiekness df at Cartesian eoordinates (x, y) is proportional to df: 

_ =y, e) df (2) 

where p{x,y,e) is the attenuation. To obtain the intensity decrease along a straight-line path ^ = i{x,y) 
at photon energy e, integrate (2) along i: = X‘"(e) exp p{x, y, e) di\, where X°“‘(e) and X“(e) 

are the emergent and incident X-ray signal energies, respectively, at photon energy e. 


To describe the polychromatic X-ray source, assume that its incident intensity X‘" spreads along photon 
energy e following the density i{e) > 0, i.e.. 


J L{e) de = X“. 


(3a) 


see Fig. la, which shows a typical density t(e). The noiseless measurement collected by an energy- 
integrating detector upon traversing a straight line i = i{x,y) has the superposition-integral form: 


= j exp 

= [ L{e) exp 


- / p,{x,y,e) di 


de 


—n{e) / a{x, y) d^ 


de 


(3b) 


where we model the attenuation p,{x, y, e) of the inspected object consisting of a single material using the 
following separable form [9, Sec. 6]: 


p,{x,y,e) = K{e)a{x,y). (4) 

Here, K,{e) > 0 is the mass-attenuation coefficient of the material, a function of the photon energy e 
(illustrated in Fig. la) and a{x,y) > 0 is the density map of the object. For a monochromatic source at 
photon energy e, ln[X‘"(e)/X™‘(e)] is a linear function of a{x,y), which is a basis for traditional linear 
reconstruction. However, X-rays generated by vacuum tubes are not monochromatic [5, 10] and we cannot 
transform the underlying noiseless measurements to a linear model unless we know perfectly the incident 
energy spectrum t(e) and mass attenuation of the inspected material K,{e). 
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B. Existing approaches for signal reconstruction from polychromatic X-ray CT measurements 

Beam-hardening correction methods can be categorized into pre-filtering, linearization, dual-energy, and 
post-reconstruction approaches [11]. Reconstruction methods have recently been developed in [12-14] that 
aim to optimize nonlinear objective functions based on the underlying physical model; [12, 13] assume 
known incident polychromatic source spectrum and imaged materials, whereas [14] considers a blind 
scenario with unknown incident spectrum and imaged materials, but employs an excessive number of 
parameters and suffers from numerical instability. See also the discussion below and in [15]. 

1) Linearization: Observe that (3b) is a decreasing function of the line integral J^a{x,y) di, which 
is a key insight behind linearization methods and the water correction approach in [16], popular in the 
literature and practice, see [9, Sec. 6]. The linearization function has been approximated by exponential 
or polynomial functions [17, 18]. The linearization functions are obtained by a variety of approaches: 
[16, 17] assume known spectrum and the materials to simulate the relation between the polychromatic 
and monochromatic projections; [18] aims to maximize the self-consistency of the projection data to get 
the correction curve blindly; [11] segments the initial reconstruction to obtain propagation path length of 
X-ray and then determines the linearization curve. 

Indeed, we can derive analytically [if K{e) and r(e) are known] or estimate this monotonic nonlinear 
function and apply the corresponding inverse function to the noisy measurements. However, such a zero¬ 
forcing approach to reconstruction ignores noise and therefore leads to noise enhancement. Due to the 
difficulty in getting precise linearization function, it has also been reported to introduce a dependency 
of dimensional measurements on the amount of surrounding material, which is a serious drawback in 
applications such as CT metrology [1]. The scanning configuration, e.g.. X-ray power and prefiltrations, 
makes choosing accurate linearization function harder. The mass-attenuation parameterization in Fig. la 
will allow for improved selection and estimation of the linearization function compared with the existing 
work. Unlike the traditional linearization, the methods that we aim to develop will estimate the linearization 
function and account for the noise effects. The zero-forcing linearization schemes could be used to initialize 
our iterative procedures. 

2) Post-reconstruction: The post-reconstruction approach first proposed in [19], uses the segmentation 
of an initial reconstruction to estimate the traversal length of each X-ray through each material. It assumes 
known incident X-ray spectrum and attenuation coefficients, as well as that the imaged materials are 
homogeneous and have constant density, leading to piecewise-constant reconstructed images. 



6 


3) Poisson and lognormal statistical measurement models: X-ray photons arriving at a detector can 
be well modeled using a Poisson process [13, 20], where the energy deposited by each photon follows 
a distribution that is proportional to the incident energy spectrum. Hence, the measurements collected 
by photon-counting and energy-integrating detectors are modeled using Poisson and compound-Poisson 
distributions, respectively [21, 22]. However, the compound-Poisson distribution is complex and does 
not have a closed-form expression, which motivates introduction of approximations such as Poisson [12] 
and lognormal [7, 14, 23]. Regardless of the statistical measurement models, most most methods in this 
category assume known X-ray spectrum and materials (i.e., known mass-attenuation function), with goal 
to maximize the underlying likelihood function or its regularized version [12, 24]. However, the X-ray 
spectrum measurements based on the semiconductor detectors are usually distorted by charge trapping, 
escape events, and other effects [25] and the corresponding correction requires highly collimated beam 
and special procedures [26, 27]. Knowing the mass-attenuation function can be challenging as well when 
the inspected material is unknown, or the inspected object is made of compound or mixture with unknown 
percentage of each constituent. 

Van Compel et al. [14] consider a “blind” scenario for lognormal measurement model with unknown 
incident spectrum and materials and employ the -means clustering method to initially associate pixels 
to the materials and then alternate between material segmentation and updating the relative density map, 
incident X-ray spectrum, and mass attenuation coefficients for each material. The methods in [14] are 
based on the standard photon-energy parameterization in (3), employ an excessive number of parameters, 
and suffer from numerical instability [15]. Indeed, alternating iteratively between updating excessive 
numbers of non-identifiable parameters prevents development of elegant algorithms and achieving robust 
reconstructions. 

4) Dual-energy: In the dual-energy approach by Alvarez and Macovski [28], two X-ray scans are 
performed using two different incident X-ray spectra. This approach is particularly attractive because it does 
not require image segmentation to separate out different materials and it provides two reconstruction images. 
Depending on the basis functions chosen for modeling the attenuation coefficients, the two reconstructed 
images depict two coefficient maps representing photoelectric and Compton scattering interaction [28] or 
two density maps of the chosen basis materials [29]. The dual-energy approach doubles the scanning time, 
needs switching scanning energy, and requires the exact knowledge of the spectra. 
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C. Mass-attenuation parameterization 

We now introduce our parsimonious parameterization of (3b) for signal reconstruction. Since the mass 
attenuation K{e) and incident spectrum density L{e) are both functions of e (see Fig. la), we combine 
the variations of these two functions and write (3a) and (3b) as integrals of k rather than e, with goal to 
represent our model using two functions i(k) (defined below) and a{x,y) instead of three [t(e), K{e), and 
a{x,y)], see also [ 6 ]. Hence, we rewrite (3a) and (3b) as (see Appendix A) 

r" = L^(0) (5a) 

1°'^^ = (^j^a{x,y)di^ (5b) 

where l^(s) = / i{K)e~^'^dK is the Laplace transform of the mass-attenuation spectrum l(/t;), which 
represents the density of the incident X-ray energy at attenuation k; here, s > 0, in contrast with the 
traditional Laplace transform where s is generally complex. For invertible K{e) with differentiable inverse 
function e{K), i(fi;) = t(e(fi;))|e'(fi;)| > 0, with e'{K) = de{K)/dn, see Appendix A-I. All K{e) encountered 
in practice can be divided into M + 1 piecewise-continuous segments {(cm, em+i)}m= 0 ’ where K{e) in 
each segment is a differentiable monotonically decreasing function of e. In this case, (5) holds and the 
expression for l(/t;) can be easily extended to this scenario, yielding (see Appendix A-II) 

M 

>■(«) = XI |e'^(K)| (6) 

m=0 

where indicator function that takes value 1 when k G (m^, Vm) and 0 otherwise. The range 

and inverse of K{e) within (€^, 6 ^+ 1 ) are {um,Vm) and respectively, with = infe^e„i+i < 

Vm = sup^Ne^ K{e). 

Observe that the mass-attenuation spectrum L(fi;) is nonnegative for all k\ 

i{k) > 0. (7) 

Due to its nonnegative support and range, l^(s) is a decreasing function of s. Here, s > 0, in contrast with 
the traditional Laplace transform where s is generally complex. The function (l^)“^ converts the noiseless 
measurement in (5), which is a nonlinear function of the density map a{x^y), into a noiseless linear 
“measurement” f^a(x,y) d£. The (l^)“^ o exp(—) mapping corresponds to the linearization function in 
[16] [where it was defined through (3b) rather than the mass-attenuation spectrum] and converts —lnX°“' 
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into a noiseless linear “measurement” J^a{x,y) d£, see also the diseussion in Section I-Bl. 

The mass-attenuation spectrum depends both on the measurement system (through the incident energy 
spectrum) and inspected object (through the mass attenuation of the inspected material). In the blind scenario 
where the inspected material and incident signal spectrum are unknown, the above parameterization allows 
us to estimate two functions [l(k) and a{x,y)] rather than three \_L{e),K{e), and a{x,y)]. This blind 
scenario is the focus of this paper. 

In Section II, we introduce the density-map and mass-attenuation parameters to be estimated and discuss 
their identifiability. Section III then presents lognormal and Poisson probabilistic measurement models and 
biconvexity properties of their negative log-likelihoods (NLLs). Section IV introduces the penalized NLL 
function that incorporates the parameter constraints and describes the proposed block coordinate-descent 
algorithm, which alternates between a Nesterov’s proximal-gradient (NPG) step for estimating the density 
map image and a limited-memory Broyden-Fletcher-Goldfarb-Shanno with box constraints (L-BFGS-B) 
step for estimating the incident-spectrum parameters. In Section V, we show the performance of the 
proposed method using simulated and real X-ray CT data. Concluding remarks are given in Section VI. 

II. Basis-Function Expansion of Mass-Attenuation Spectrum 

Upon spatial-domain discretization into p pixels, replace the integral f^a(x,y) df with 

a(x,y) d£ ^ (8) 

where CKAOisapxl vector representing the 2D image that we wish to reconstruct [i.e., discretized 
a(x, y), see also (7)] and 0AOisapxl vector of known weights quantifying how much each element of 
OL contributes to the X-ray attenuation on the straight-line path £. An X-ray CT scan consists of hundreds 
of projections with the beam intensity measured by thousands of detectors for each projection. Denote 
by N the total number of measurements from all projections collected at the detector array. For the nth 
measurement, define its discretized line integral as stacking all N such integrals into a vector yields 
$q:, where = [0^ 4>2''' 07v]^ ^ is the projection matrix, also called Radon transform matrix in 

a parallel-beam X-ray tomographic imaging system. We call the corresponding transformation, $q:, the 
monochromatic projection of ck. 

Approximate L(fi;) with a linear combination of J (J N) basis functions: 



l(/t:) = b{n)I 


(9a) 
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where 


X^[Xi,X2,...,Xj]^ (9b) 

^ 0 (9c) 

is the J X 1 vector of corresponding basis-function coefficients and the 1 x J row vector function 

b{K) = [bi{K),b2{K),bj{K)] (9d) 

consists of B-splines [30] of order one (termed B1 splines, illustrated in Fig. lb); in this case, the 
decomposition (9a) yields nonnegative elements of the spline coefficients X [based on (7)] and thus allows 
us to impose the physically meaningful nonnegativity constraint (9c) when estimating X. The spline knots 
are selected from a growing geometric series with kq > 0, 

Kj = q’ Kq (10a) 


and common ratio 


g > 1 


(10b) 


which yields the B1-spline basis functions: 


K — Kj-i 


b,{K) = I 

Kj+i - Kj ’ 

0 , 


Kj-i < K < Kj 


Kj < K < Kj+i 

Otherwise 


(10c) 


where the jth basis function can be obtained simply by g-scaling the (j — l)th basis function: 


bj{K) = bj+i {qk) 


(lOd) 


see also Fig. lb. 

The geometric-series knots have a wide span from kq to kj+i and compensate larger k (i.e., higher 
attenuation implying “exponentially” smaller term^) with a “geometrically” wider integral range, 

'while using b''(')I to approximate F)-) = f L(fv)e“''^ dK, to balance the weight to each we prefer {&((•)}/=! that can provide 

similar values over j. This desired property is achieved through the geometric-series knots. 
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that results in a more effective approximation of (5). The common ratio q in (10b) determines the resolution 
of the B1-spline approximation. Here, we select q and J so that the range of k spanning the mass-attenuation 
spectrum is constant: 

^li+1 = g^+i = const. (lOe) 

Kq 

In summary, the following three tuning constants: 




(lOf) 


define our B1-spline basis functions 6(/t). 

Substituting (8) and (9a) into (5) for each of the N measurements yields the following expressions for 
the incident energy and the iV x 1 vector of noiseless measurements: 

r"(X) = b^(0)X (lla) 

X°“'(a,X) = b|:(<I>a)X (lib) 

where, following the notation in Section I, (s) = is an output basis-function matrix obtained 

by stacking the 1 x J vectors b^(<Sn) columnwise, and s = $q: is the monochromatic projection. Since 
the Laplace transform of (10c) can be computed analytically, h^{s) has a closed-form expression. 

A. Ambiguity of the density map and mass-attenuation spectrum 

We discuss density-map scaling ambiguity under the blind scenario where both the density map ol and 
incident spectrum parameters X are unknown. By noting (lOd) and the ^-scaling property of the Laplace 
transform, 6, {qn) A- -h^{s/q) for g > 0, we conclude that selecting q times narrower basis functions 
, &j_i(k)] than those in h{K) and q times larger density map and spectral parameters {qa 
and ql) yields the same mean output photon energy. Consequently, 

X-'(a, [0,X2,... ?[^2,... 0]'^). (12) 

Hence, when X has a leading zero, the noiseless signal output remains the same if we shift the elements 
of X (and correspondingly the mass-attenuation spectrum) to the left, followed by scaling the new X and 
OL by q. Equivalently, when X has a trailing zero, the noiseless signal output remains the same if we shift 
the elements of X (and correspondingly the spectrum) to the right and scale the new X and ck by 1 /g. We 
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refer to this property as the shift ambiguity of the mass-attenuation spectrum, which allows us to rearrange 
leading or trailing zeros in the mass-attenuation coefficient vector I and position the central nonzero part 
of X. 

In Section III, we introduce the lognormal and Poisson measurement models (Sections III-A and III-B) 
and establish conditions for biconvexity of the corresponding NLLs with respect to the density map and 
incident-spectrum parameters (Section III-C). 

B. Rank ofb]^{^ct) and selection of the number of splines J 

If bo($Q:) does not have full column rank, then X is not identifiable even if a. is known, see (11b). The 
estimation of X may be numerically unstable if bo($Q:) is poorly conditioned and has small minimum 
singular values. 

We can think of the noiseless X-ray CT measurements as h^{s)I sampled at different s = G 
[0,maXn((/)^Q:)]. The following lemma implies that, if we could collect all s G [0, a], a > 0 (denoted s), 
then the corresponding b\{s) would be a full-rank matrix. 

Lemma 1: J = Ojxi is the necessary condition for = 0 over the range s G [0,a], where 

JT G and a > 0. 

Proof: Use the fact that b^{s)J' is analytic with respect to s. If b^{s)J' = 0 in the domain s G [0, a], 
the nth order derivative of b^(s) JT with respect to s is zero over this domain for all n = 1, 2,.... Then, 
according to the Taylor expansion of the whole complex plane at some point in [0,a], b^{ )J' = 0 
everywhere. Therefore, b{K)J' = 0, thus, ff = 0. ■ 

If our data collection system can sample over [O, max„(0^Q:)] sufficiently densely, we expect bo(<I>Q:) 
to have full column rank. 

As the number of splines J increases for fixed support [kq, kj+i] [see (lOe)], we achieve better resolution 
of the mass-attenuation spectrum, but bo(<I>Q:) becomes poorly conditioned with its smallest singular values 
approaching zero. To estimate this spectrum well, we wish choose a J that provides both good resolution 
and sufficiently large smallest singular value of 

Fortunately, we focus on the reconstruction of ct, which is affected by X only through the function 
b^{s)I, and b^{s)I is stable as we increase J. Indeed, we observe that, when we choose J that is 
significantly larger than the rank of b]^{^ct), the estimation of cx will be good and b|;($Q:) stable, even 
though the estimation of X is poor due to its non-identifiability. The increase of J will also increase the 
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computational complexity of signal reconstruction under the blind seenario where the mass-attenuation 
speetrum is unknown. 

III. Measurement Models and Their Properties 
A. Lognormal measurement model 

Consider an TV x 1 veetor S of energy measurements eorrupted by independent lognormal noise and 
the eorresponding NLL funetion [see (11b)]: 

C{cx,I) = ^||lno£ -lnoX°'^‘(«,^)||^ (13) 

In the following, we express the NLL (13) as a funetion of a. with I fixed (Seetion III-Al) and viee 
versa (Seetion III-A2), and derive eonditions for its eonvexity under the two seenarios. These results will 
be used to establish the bieonvexity eonditions for this NLL in Seetion III-C and to deseribe our bloek 
eoordinate-deseent estimation algorithm in Seetion IV. 

1) NLL of ex: Reeall (9a) and define 


L|;($a) = b^^i^cx)! (I4a) 

obfained by staeking |l^(0^q:)}^_^ eolumnwise. The NLL of a for fixed X is 

/:,(q:) = ^||lno£ - Ino L^(<I>Q:)||2 (14b) 

whieh eorresponds to the Gaussian generalized linear model (GLM) for measurements lno(£) with design 
matrix <I> and link funetion (Lo)“^ (exp(-)). See [31] for introduetion to GLMs. 

To establish eonvexity of the NLL (14b), we enforee monotonieity of the mass-attenuation speetrum 
l(/t;) in low- and high-zt regions and also assume that the mid-zt region has higher speetrum than the low-zt 
region. Note that we do not require here that l(zt;) satisfies the basis-funetion expansion (9a); however, 
the basis-funetion expansion requirement will be needed to establish the bieonvexity of the NLL in (13). 
Henee, we define the three n regions using the spline parameters (lOf) as well as an additional integer 
eonstant 


jo > [(-1 + l)/2l. 


(15a) 
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In particular, the low-, mid-, and high-K regions are defined as 


^low [^0) ^J+1—Jo. > ^mid — [^"^+1—Jo > ^io. ^high .^JO>^"^+l]' 


(15b) 


Assumption 1: Assume that the mass-attenuation spectrum satisfies 


A 


L: [kq, kj+i] —)■ M+ L non-decreasing and non-increasing in /Ciow and /Chigh, 


and L(fi;) > L(fi;j+i_jQ) V/t G /C 



(16) 


If the basis-function expansion (9a) holds, (16) reduces to 


A= <Ze 


Zl < X2 < . . .< ij+l-jg, Zjg > ... > Xj_l > Xj, 


and Ij > I. 7 + 1 -jo, Vj e [J + 1 - jo, jo] 


(17) 


see also Fig. lb. Here, the monotonic low- and high-K regions each contains X —jo knots in the B 1-spline 
representation, whereas the central region contains 2jo — J knots. 

In practice, the X-ray spectrum 7(e) starts at a lowest effective energy that can penetrate the object, 
vanishes at the tube voltage (the highest photon energy) and has a region in the center higher than the 
two ends, see Fig. la. When the support of 7 (e) is free of Tf-edges, the mass attenuation coefficient k(£) 
is monotonic, thus i(fi;) (as a function of k) has the same shape and trends as t(e) (as a function of e) 
and justifies Assumption 1. If a 7f-edge is present within the support of t(e), it is difficult to infer the 
shape of L(fi;). In most cases. Assumption 1 holds. 

For the approximation of l(k) using B 1-spline basis expansion, as long as [ko,«j+i] is sufficiently 
large to cover the range of k(£) with e e supp(t(e)), we can always meet Assumption 1 by selecting jo 
appropriately. 

Multiple different (o:,X) share the same noiseless output X°“'(q:,X) and thus the same NLL, see 
Section II-A. In particular, equivalent (o:,X) can be constructed by left- or right-shifting the mass 
attenuation spectrum and properly rescaling it and the density map, see (12). Selecting a fixed jo in 
(15a) can exclude all these equivalent values but the one in which the mass-attenuation spectrum satisfies 
(16) and where the biconvexity of the NLL can be established. 

Lemma 2: Provided that Assumption 1 holds, the lognormal NLL Ci{ol) is a convex function of a. over 
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the following region: 



Lq($q:) ^ e a G 


(18a) 


where 




2glo 

(gio _ 1)2- 


(18b) 


Proof: See Appendix B. ■ 

Upon taking the logarithm of both sides and rearranging, the condition in (18a) corresponds to upper- 
bounding the residuals ln£^„ — lnX™‘(Q:,X) by U. The bound U depends only on the common ratio q and 
constant jo used to describe the constraint on i(/t:) in Assumption 1. Note that Lemma 2 does not assume 
a basis-function expansion of the mass-attenuation spectrum, only that it satisfies (16). 

If we wish i{k) to cover the same range [i.e., (lOe) holds], then reducing q needs to be accompanied 
by increasing J, which also leads to a larger jo. Indeed, q^° in U is the ratio of the point where i{K) starts 
to be monotonically decreasing to the point where the support of L(fi;) starts, see Fig. Ib. 

2) NLL of I: Fix ck and define 


A = (I9a) 

The NLL of X for fixed a reduces to a Gaussian GLM for measurements lno(£) with design matrix A 
and exponential link function: 

/:^(X) = ^||lno£-lno(AX)||2. (19b) 

In the following, we first provide the condition for the convexity of £^(X): 

Lemma 3: The NLL Ca{2L) in (19b) is a convex function of X in the following region: 

{X I AX A 65 } . (20) 

Upon taking the logarithm of both sides and rearranging, the condition in (20) corresponds to lnoX°“‘(Q:,X) — 
Ino £ A 1. 
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Proof: Since = A, we have 

^ A^diag“^(AX) [ino(AX) - Ino £] (21a) 

(X) 

dldl^ ^ diag"^(AX) diag(l - Ino(AX) + Ino £) A (21b) 

Aeeording to (20), diag (l — Ino(AX) + Ihq £) AO, the Hessian (21b) is positive semidefinite, and Ca{^) 
is eonvex. ■ 

From the Hessian expression in (21b), we eonelude that Ca{^) in (19b) is strongly eonvex if the 
eondition (20) holds with striet inequality and design matrix A has full rank. 

Combining the eonvexity results in Lemmas 2 and 3 yields the bieonvexity region for the lognormal 
NLL £(q:,X) in (13), see Seetion III-C. 


B. Poisson measurement model 

For an iV X 1 veetor S of independent Poisson measurements, the NLL in the form of generalized 
Kullbaek-Leibler divergenee [32] is [see also (11b)] 

X) = [X™'(a, X) - £] - [Ino X™‘(a, X) - In^ £] . (22) 

In the following, we express (22) as a funetion of a. with X fixed (Seetion III-Bl) and viee versa 
(Seetion III-B2), whieh will be used to deseribe our estimation algorithm in Seetion IV. 

1) NLL of ex: Reeall (9a) and write the NLL of ex for fixed X as 

C, {ex) = 1^ [LL($a) [Ino LL($a) - Ino £] (23) 


whieh eorresponds to the Poisson GLM with design matrix <l> and link funetion equal to the inverse of 


lL(.). 

Lemma 4: Provided that Assumption 1 holds, the Poisson NLL £i(q:) is a eonvex funetion of ex over 
the following region: 


V = lex 


iL($a) A (1 - V)S, ex e 


(24a) 


where 


1.4 


g2io + 1 ■ 


(24b) 


Proof: See Appendix B. 
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Note that the region in (24a) is only suffieient for eonvexity and that Lemma 4 does not assume a 
basis-funetion expansion of the mass-attenuation speetrum, only that it satisfies (16). 

Similar to U,1 — V, the lower bound of X^\cx,I) j8n for all n, is also a funetion of whieh is 
determined by the shape of l(/t;). 

2) NLL of I: For fixed a, the NLL of I reduees to [see (19a)] 

Ca (X) = {AI - S) - [Ino [AX) - Ino S] (25) 

whieh eorresponds to the Poisson GLM with design matrix A in (19a) and identity link. 

Lemma 5: The NLL Ca{X) in (25) is a eonvex funetion of I for all I E 
Proof: The gradient and Hessian of the NLL in (25) are 

- diag"^(AX)£] (26a) 

^ diag(£) diag“2(2lX)2l (26b) 

where the Hessian matrix (26b) is positive semidefinite. Thus, Ca{X) in (25) is eonvex on ■ 

From the Hessian expression in (26b), we eonelude that Ca{X) in (25) is strongly eonvex if the design 
matrix A has full rank. 

Combining the eonvexity results in Lemmas 4 and 5 yields the bieonvexity region for the NLL 
in (22), see the following seetion. 


C. Biconvexity of the NLLs 

Theorem 1: Suppose that Assumption 1 in (17) holds. Then, 

I) the lognormal NLL funetion (13) is bieonvex [33] with respeet to ol and X in the region 


Q 


:«,x) 


.-u. 


S A X™'(a,X) I eA, otE 


(27a) 


whieh bounds the elements of the residual veetor luo S — lnoX™‘(Q:,X) [whose squared (' 2 -norm we 
wish to minimize, see (13)] between —1 and U, and 
II) the Poisson NLL (22) is bieonvex with respeet to a and X in the following set: 


V = |(a,X) |X™‘(a,X) ^ {l-V)S, I E A, cx E } 


(27b) 


whieh bounds j£n from below by 1 — C for all n, see also (24b). 
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Proof: We first show the eonvexity of the regions defined in (27a) and (27b) with respeet to eaeh 
variable (a and X) with the other fixed. We then show the eonvexity of the NLL funetions (13) and (22) 
for eaeh variable. 

The region A in (17) is a subspaee, thus a eonvex set. Sinee X™‘ in (11b) is a linear funetion of X, the 
inequalities eomparing X°“‘ to eonstants speeify a eonvex set. Therefore, both & Q} and 

Va = {XI (q:,X) g X} are eonvex for fixed cx G M+ beeause both are interseetions between the subspaee 
A and a eonvex set via X°“^ Sinee hj{K) > 0, dn are deereasing funetions 

of s, whieh, together with the faet that X ^ 0, implies that 6^(s)X is a deereasing funetion of s. Sinee 
the linear transform $ 0 : preserves eonvexity, both Qi = {ol\ (q:,X) G and Vx = {ck | (q:,X) G V} 
are eonvex with respeet to cx for fixed X G Therefore, Q and V are bieonvex with respeet to X and cx. 

Observe that Q in (27a) is the interseetion of the regions speeified by Assumption 1 and Lemmas 2 
and 3. Thus, within Q, the lognormal NLL (13) is a eonvex funetion of cx (with fixed X) and X (with 
fixed ck), respeetively. Similarly, V in (27b) is the interseetion of the regions speeified by Assumption 1 
and Lemmas 4 and 5. Thus, within V, the Poisson NLL (22) is a eonvex funetion of cx (with fixed X) 
and X (with fixed ck), respeetively. 

By eombining the above region and funetion eonvexity results, we eonelude that (13) and (22) are 
bieonvex within Q and V, respeetively. ■ 

IV. Density-Map and Mass-Attenuation Spectrum Estimation 
A. Penalized NLL objective function and its properties 

Our goal is to eompute penalized maximum-likelihood estimates of the density-map and mass-attenuation 
speetrum parameters (q:,X) by solving the following minimization problem: 

min/(Q:,X) (28a) 

OL,X 

where 

f{cx,I) = C{cx, X) -f urjcx) + I[o,+oo)(X) (28b) 

is the penalized NLL objeetive funetion, m > 0 is a sealar tuning eonstant, the density-map regularization 
term r(Q:) enforees nonnegativity and sparsity of the signal cx in an appropriate transform [e.g., total- 
variation (TV) or diserete wavelet transform (DWT)] domain, and the third summand enforees the 
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nonnegativity of the mass-attenuation speetrum parameters X, see (9e). We eonsider the following two 

r(Q:): 

P I - 

+V+oo)(Q^) (29a) 

i=i y je^^i 

r{cx) = ||^^a||^ + I[o,+oo)(a) (29b) 

where the seeond summands enforee the nonnegativity of ct. The first summand in (29a) is an isotropie 
TV-domain sparsity term [34, 35] and A/) is index set of neighbors of the ith element of ol, where elements 
of OL are arranged to form a 2-dimensional image. Eaeh set Mi eonsists of two pixels at most, with one on 
the top and the other on the right of the ith pixel, if possible [35]. The first summand in (29b) imposes 
sparsity of transform-domain density-map eoeffieients 4 /^q:, where the sparsifying dietionary tl' G W^p' 
(p < p') is assumed to satisfy the orthogonality eondition 

= Ip. (30) 

In [7], we used the sparsity regularization term (29b) with (30) and the lognormal NLL in (13). 

Sinee both r{ct) in (29a) and (29b) are eonvex for any ct, and I[o,+oo)(2i) in (28b) is eonvex for all X, 
the following holds. 

Corollary 1: The objeetive f{cx,I) in (28b) is bieonvex with respeet to ct and X under the eonditions 
speeified by Theorem 1. 

Although the NLL may have multiple loeal minima of the form q'^ot (see Seetion ITA), those with 
large n can be eliminated by the regularization penalty. To make it elear, let us first see the impaet the 
ambiguity has on the sealing of the first derivative of the objeetive funetion f{ct,I). Lrom (12), we 
eonelude that Zq = (cc, [0,X2,... ,Xj]^) and Zi = (get, q[l 2 ,... ,Xj, 0]^) have the same X°“'(q:,X) and 
thus the same NLL. The first derivative of C{ct,I) = C{z) over cc at zi is 1/g times that at zq. Meanwhile, 
the subgradients of the regularization term at Zq and Zi with respeet to ol are the same. Henee, with 
the inerease of n, the slope of f{ct,I) is dominated by the penalty term. This is also experimentally 
eonfirmed: we see the initialization q'^ct^^^ with large n being redueed as the iteration proeeeds. 

We now establish that the objeetive funetion (28b) satisfies the Kurdyka-Lojasiewiez (KL) property [36], 
whieh is important for establishing loeal eonvergenee of bloek-eoordinate sehemes in bieonvex optimization 
problems. 
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Theorem 2 (KL Property): The objective function satisfies the KL property in any compact 

subset C C dom(/). Here, the NLL C{cx,I) is either lognormal or Poisson in (13) or (22), respectively, 
and the penalty r(Q:) in (28b) is given in (29a) or (29b). 

Proof: See Appendix C. ■ 

Note that the domain of / requires positive measurements such that X™‘(q:,X) >- 0, which excludes 
the case X = 0 when no incident X-ray is applied, see also (11b). The compact set assumption keeps the 
distance between X and 0 from going to zero. 


B. Minimization algorithm 

The parameters that we wish to estimate are naturally divided into two blocks, cx and X. The large 
size of ct prohibits effective second-order methods under the sparsity regularization, whereas X has much 
smaller size and only nonnegative constraints, thus allowing for more sophisticated solvers, such as the 
quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) approach [37, Sec. 4.3.3.4] that we adopt here. 
In addition, the scaling difference between ct and X can be significant so that the joint gradient method 
for Ct and X together would converge slowly. Therefore, we adopt a block coordinate-descent algorithm 
to minimize f{a,X) in (28b), where the NPG [7] and L-BFGS-B [38] methods are employed to update 
estimates of the density-map and mass-attenuation spectrum parameters, respectively. The choice of block 
coordinate-descent optimization is also motivated by the related alternate convex search (ACS) and block 
coordinate-descent schemes in [33] and [39], respectively, both with convergence guarantees under certain 
conditions. 

We minimize the objective function (28b) by alternatively updating ct and X using Step 1) and Step 
2), respectively, where Iteration i proceeds as follows: 

Step 1) (NPG) Set the mass-attenuation spectrum l(/t:) = treat it as known^, and descend 

the regularized NLL function /(q:,X*^*“^^) = Cfot) + ur{ot) by applying an NPG step for cx, 
which yields 


0(9 

1 

“ 2 

1-1 

+ 


= cx^‘ 

i-i) 


= arg min; 


(*-i) _ X 


0(9 


cx 


(i-l) _ Q,(i-2) 


') 


2 / 3(9 


CK 


- + /3«V£,(«(*)) II 2 + uricx) 


^This selection corresponds to Ci{ot) = ^^), see also (14b) and (23). 


(31a) 

(31b) 

(31c) 
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where > 0 is an adaptive step size chosen to satisfy the majorization condition: 

+ (a« -a«)'^VA(a«) + ^||a« -a«||2 (31d) 

using a simple adaptation scheme that aims at keeping as large as possible, as long as 
it satisfies (3 Id), see Section IV-B3 for details; we also apply the “function restart” [40] to 
restore the monotonicity and improve convergence of NPG steps. 

Step 2) (BFGS) Set the design matrix A = treat it as known^, and minimize the regularized 

NLL function /(q:*^*\X) with respect to X, i.e., 

X*^*^ = argmn£^(X) (32) 

using the inner L-BFGS-B iteration, initialized by 

We refer to this iteration as the NPG-BFGS algorithm. 

The above iteration is the first physical-model based image reconstruction method (in addition to our 
preliminary work in [6]) for simultaneous blind (assuming unknown incident X-ray spectrum and unknown 
materials) sparse image reconstruction from polychromatic measurements. In [6], we applied a piecewise- 
constant (BO spline) expansion of the mass attenuation spectrum, approximated Laplace integrals with 
Riemann sums, used a smooth approximation of the nonnegativity penalties in (29) instead of exact, and 
did not employ signal-sparsity regularization. 

The optimization task in (31c) is a proximal step: 

q;(*) = prox^{i)„,, j (33a) 

where the proximal operator for scaled (by A > 0) regularization term (29) is defined as [41] 

prox;^^ (a) = argmin -||q: — a ||2 -f Ar(Q:). (33b) 

a 2 

If we do not apply the Nesterov’s acceleration (31a)-(31b) and use only the proximal-gradient (PG) step 
(31c) to update the density-map iterates cx, i.e., 

^(9 ^ ( 34 ) 


^This selection corresponds to CAitL) = , see also (19b) and (25). 
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then the corresponding iteration is the PG-BFGS algorithm, see also the numerical examples and Fig. 8b. 
We show that our PG-BFGS algorithm is monotonic (Section IV-B3) and converges to one of the critical 
points of the objective (Section IV-B5). Unfortunately, such theoretical convergence properties do 

not carry over to NPG-BFGS iteration, yet this iteration empirically outperforms the PG-BFGS method, 
see Section V. 

If the mass-attenuation spectrum l(/t;) is known and we iterate Step 1) only to estimate the density map 
ct, we refer to this iteration as the NPG algorithm for ct. 

Step 1) employs only one NPG step, where an inner iteration is needed to compute the proximal step 
(31c), whereas Step 2) employs a full L-BFGS-B iteration. Hence, both Step 1) and Step 2) require inner 
iterations. To solve (33b) for the two regularizations, we adopt 

• TV-based denoising method in [35, Sec. IV] for the TV regularization (29a) and 

• the alternating direction method of multipliers (ADMM) method for the DWT-based regularization 
(29b), described in the following section. 

1) Proximal Mapping using ADMM for DWT-based regularization: We now present an ADMM scheme 
for computing the proximal operator in (33b) for the DWT-based regularization (29b) (see [42, Appendix C]): 

(^a + (35a) 

gik+i) ^ (35b) 

^(fc+i) ^ ^ ^{fc+i) _ xi)T^{k+i) ^25 c) 

where k is the inner-iteration index and p > 0 is a step-size parameter, usually set to 1 [43, Sec. 11]. We 
obtain (35) by decomposing the proximal objective function (33b) into the sum of \ ||q: — a ||2 -|-I[o,+oo)(ck) 
and A||4 /^q:||i. The above algorithm is initialized by 

s(o) = and = Opxi (36) 

which is equivalent to projecting the noisy signal a onto the nonnegative signal space = (a-)_|_. Note that 
the signal estimate is within the signal space and yields finite objective function ^\\cx—a\\ 2 +\r{a.) 

that we wish to minimize; we use it to obtain the final signal estimate upon convergence of the ADMM 
iteration. 
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In the following, we describe the convergence criteria for the outer and inner iterations (Section IV-B2), 
adaptive step-size selection (Section IV-B3), algorithm initialization (Section IV-B4), and convergence 
analysis (Section IV-B5). 

2) Convergence criteria: Define the measures of change of the density map ol and the NLL 







(37a) 

(37b) 


upon completion of Step 1) in Iteration i. We run the outer iteration between Step 1) and Step 2) until 
the relative distance of consecutive iterates of the density map ol does not change significantly: 




< e CK 


(*)| 


(38) 


where e > 0 is the convergence threshold. 

Proximal step (31c). The convergence criteria for the inner TV-denoising and ADMM iterations used 
to solve (31c) in Iteration i are based on the relative change 

||a(*’^) - II 2 < (39a) 

max|||s(*’'=) - (39b) 

where k are the inner-iteration indices and the convergence tuning constant rja € (0,1) is chosen to trade 
off the accuracy and speed of the inner iterations and provide sufficiently accurate solutions to (31c). Note 
that in (39b) is the dual variable in ADMM that converges to and the criterion is on both 

the primal residual ||s0:fc) _ and the dual residual ||s0:fc) _ [43, Sec. 3.3]. 

BFGS step (32). For Step 2) in Iteration i, we set the convergence criterion as 

|/:^(X0’^)) - £^(x0’^-9)| < ^^49 (40) 

where the tuning constant px G (0,1) trades off the accuracy and speed of the estimation of X. Since 
our main goal is to estimate the density map ol, the benefit to Step 1) provided by Step 2) through the 
minimization of Ca{2C) in (32) is more important than the reconstruction of X itself; furthermore, the 
estimation of X is sensitive to the rank of A. Hence, we select 4^ as a convergence metric for the inner 
iteration for X in Step 2) of Iteration i. 
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We safeguard both the proximal and BFGS steps by setting the maximum number of iteration limit nsub- 
In summary, the outer and inner convergence tuning constants are (e, ?7 q,, ?7 x, ^^sub)- The default values of 
these tuning constants are 

(e, Vx: nsub) = (l0-^ 20) (41) 

adopted in all numerical examples in this paper. 

3) Adaptive step-size selection in Step 1): We adopt an adaptive step size selection strategy to 
accommodate the scenarios when the local Lipschitz constant varies as the algorithm evolves. 

We select the step size to satisfy the majorization condition (3Id) using the following adaptation 
scheme [42, 44]: 

a) In Iteration i: if there has been no step size reductions for n consecutive iterations, i.e., = 

^(*- 2 ) ^ ... ^ ^ larger step size where ^ e ( 0 , 1 ) is a step-size 

adaptation parameter; otherwise start with 

b) Backtrack using the same multiplicative scaling constant with goal to find the largest that 
satisfies (3 Id). 

We select the initial step size using the Barzilai-Borwein method [45]. 

4) Initialization: Initialize the density-map and mass-attenuation coefficient vector iterates as 


Q;(0) = oS = SpBP, 


= 0 


X(o) = 


maxn £ri 
^r{j+i)/2i (0) 


er(^+i)/2i 


(42a) 

(42b) 


where ckfbp is the standard FBP reconstruction [10, Ch. 3], [5, Sec. 3.5]. We select the initial step 
size using the Barzilai-Borwein method [45]. Plugging the initialization (42b) into (11b) yields 
and the initial estimate corresponds approximately to a monochromatic 

X-ray model; more precisely, it is a polychromatic X-ray model with a narrow mass attenuation spectrum 
proportional to 6 |-(j+i)/ 2 ] (k). It is also desirable to have the main lobe of the estimated spectrum at the 
center, which is why the nonzero element of is placed in the middle position. 

5) Convergence Analysis of PG-BFGS Iteration: We analyze the convergence of the PG-BFGS iteration 
using arguments similar to those in [39]. Although NPG-BFGS converges faster than PG-BFGS empirically, 
it is not easy to analyze its convergence due to NPG’s Nesterov’s acceleration step and adaptive step size. 
In this section, we denote the sequence of PG-BFGS iterates by | 
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The following lemma establishes the monotonieity of the PG-BFGS iteration for step sizes that 
satisfy the majorization eondition, whieh ineludes the above step-size seleetion as well. 

Lemma 6: For a sequenee { of PG-BFGS iterates with step size satisfying the 

majorization eondition (3Id), the objeetive funetion in (28b) is monotonieally non-inereasing: 

(43) 


for all i under the eonditions speeified by Theorem 1. 

Proof: Due to Theorem 1, /(ck, X*-*"^^) is a eonvex funetion of ck; henee, we ean apply [46, Lemma 2.3] 
to obtain 


/(a,X(*-i))-/(a«,X(*-i)) > 


2/3W L 


11 (i 

1 - (^ 2 ) 1 1 

|| q :^ 

— ^11 


+ 2(a 


7v(*) _ 


a. 




(44a) 


for any ex, as long as (3 Id) is satisfied. In the PG-BFGS iteration, the momentum term in (31b) is zero, 
i.e., (34) holds. Plug in (34) and cx = into (44a): 


- /(a«,X(*-')) > > 0 (44b) 

and (43) follows by using the faet see Step 2). ■ 

Unfortunately, the monotonieity does not earry over when Nesterov aeeeleration is employed and the 
momentum term is not zero; this ease requires further adjustments sueh as restart, as deseribed in Step 1). 

Sinee our f{<x,I) are lower bounded [whieh is easy to argue, see Appendix C], the sequenee /(q:*^*\X*^*^) 
eonverges. It is also easy to eonelude that the sequenee is Cauehy by showing 

< +oo; thus eonverges to a eritieal point (also ealled stationary point) ct* [36, Lemma 2.2]. 

A better result ~ ^*'*'*'^^112 ^ [39] ean be established if we ean show that f{<x,I) satisfies 

the KL property [36], whieh regularizes the (sub)gradient of a funetion through its value at eertain point or 
the whole domain and also ensures the steepness of the funetion around the optimum so that the length of 
the gradient trajeetory is bounded. The KL property has been first used in [36] to establish the eritieal-point 
eonvergenee for an alternating proximal-minimization method, whieh is then extended by [39] to the more 
general bloek eoordinate-deseent method. 

Now, we ean make the following elaim on the eonvergenee of PG-BFGS iteration. 

Theorem 3: Consider the sequenee { of PG-BFGS iterates, with step size satisfying 

the majorization eondition (3 Id). Assume that 
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1) there exist positive /3+ > /3_ > 0 such that E [f3-,f3+] for all i, 

2) C{cx,I) is a strong convex function of X, and 

3) the first derivative of f{cx,I) over {cx,T) is Lipschitz continuous 

and that the conditions of Theorem 1 hold, ensuring the biconvexity of /. Then, converges to 

one of the critical points (a.*,!*) of f{cx,I) and 

OO CXD 

<+00, ^|lx(*+i)-X«l|2 <+00 (45) 

i=l i=\ 

Proof: See Appendix C. ■ 

The conditions for strong convexity of £(q:,X) as a function of a. are discussed in Sections III-A2 
and III-B2, see also Section II-B. The KL property can provide guarantees on the convergence rate under 
additional assumptions, see [36, Theorem 3.4]. The convergence properties of NPG-BFGS are of great 
interest because NPG-BFGS converges faster than PG-BFGS; establishing these properties is left as future 
work. 


V. Numerical Examples 

We now evaluate our proposed algorithm by both numerical simulations and real data reconstructions. 
Here, we focus on the Poisson measurement model only and refer to [7] for the lognormal case. 

We construct the fan-beam X-ray projection transform matrix $ and its adjoint operator directly 
on GPU with full circular mask [47], and the multi-thread version on CPU is also available; see https: 
//github.com/isucsp/imgRecSrc. 

Before applying the reconstruction algorithms, the measurements S are normalized by division of their 
maximum such that max„ = 1, which stabilizes the magnitude of NLL by scaling it with a constant, 
see (22). We set the B 1-spline tuning constants (lOf) to satisfy [see also (lOe)] 

q-^ = 10^ «ro.5(j-hi)l = 1, J = 30 (46a) 

which ensure sufficient coverage (three orders of magnitude) and resolution (30 basis functions) of the 
basis-function representation of the mass-attenuation spectrum and centering its support around 1. We set 
the convergence and adaptive step-size tuning constants for the NPG-BFGS method as 


(e, lix, nsub) = (l0-^ 10-3,10-2, 20), (n, 0 = (4,0.5) 


(46b) 
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Figure 2: (a) Density-map image and (b) mass attenuation and ineident X-ray speetrum as funetions of 
the photon energy e. 

limit the number of outer iterations to 4000 at most. 

A. Reconstruction from simulated Poisson measurements 

Consider reeonstruetion of the 512 x 512 image in Fig. 2a of an iron objeet with density map CKtme- 
We generated a fan-beam polyehromatie sinogram, with distanee from X-ray souree to the rotation eenter 
equal to 2000 times the pixel size, using the interpolated mass attenuation K{e) of iron [48] and ineident 
speetrum L{e) from tungsten anode X-ray tubes at 140 keV with 5 % relative voltage ripple [49], see Fig. 2b. 
The mass-attenuation speetrum l(k) is eonstrueted by eombining K,{e) and L{e), see [6] and [7, Fig. le]. 
Our simulated approximation of the noiseless measurements uses 130 equi-spaeed diseretization points 
over the range 20keV to 140 keV. We simulated independent Poisson measurements {Sn)n=i with means 
(Ji£n)n=i — X°“‘(q:,X). We mimic real X-ray CT system calibration by scaling the projection matrix 
and the spectrum i(e) so that the maximum and minimum of the noiseless measurements (EPn)n=i 
2^® and 20, respectively. Here, the scales of 4> and i{e) correspond to the real size that each image pixel 
represents and the current of the electrons hitting the tungsten anode as well as the overall scanning time. 

Our goal here is to reconstruct a density map of size 512 x 512 by the measurements from an energy- 
integrating detector array of size 512 for each projection. 

Since the true density map is known, we adopt relative square error (RSE) as the main metric to assess 
the performance of the compared algorithms: 



(47) 


where CKtme and ol are the true and reconstructed signals, respectively. Note that (47) is invariant to scaling 
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CK by a nonzero constant, which is needed because the magnitude level of a is not identifiable due to the 
ambiguity of the density map and mass attenuation spectrum (see also Section. II-A). 

We compare 

• the traditional FBP method without [10, Ch. 3] and with linearization [16], i.e., based on the ‘data’ 

y = —lno(£) (without linearization) (48a) 

(with linearization) (48b) 


respectively, 

• linearized basis pursuit denoising (BPDN) in which we apply the NPG approach to solve the BPDN 

problem [35]: 

min-||y — $q :||2 + mV(q:) (49) 

Q. 2 

where y are the linearized measurements by (48b) and the penalty r(Q:) has been defined in (29). 

• our 

- NPG-BFGS alternating descent method, 

- NPG for known mass attenuation spectrum i(fi;). 

with the Matlab implementation available at https://github.com/isucsp/imgRecSrc. 

For all methods that use sparsity and nonnegativity regularization (NPG-BFGS, NPG, and linearized 
BPDN) the regularization constants u and u' have been tuned manually for best RSE performance. All 
iterative algorithms employ the convergence criterion (38) with threshold e = 10“® and the maximum 
number of iterations set to 4000. We initialize iterative reconstruction schemes with or without linearization 
using the corresponding FBP reconstructions. 

Here, the non-blind linearized FBP, NPG, and linearized BPDN methods assume known L(fi;) [i.e., known 
incident spectrum of the X-ray machine and mass attenuation (material)], which we computed using (9a) 
with I equal to the exact sampled i(fi;) and J = 100 spline basis functions spanning three orders of 
magnitude. The FBP and NPG-BFGS methods are blind and do not assume knowledge of l(/t;); FBP 
ignores the polychromatic source effects whereas NPG-BFGS corrects blindly for these effects. 

Fig. 3 shows the average RSEs (over 5 Poisson noise realizations) of different methods as functions 
of the number of fan-beam projections in the range from 0° to 359°. RSEs of the methods that do not 
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40 80 120 160 200 240 280 320 360 
# of Projections 

Figure 3: Average RSEs as funetions of the number of projeetions. 

assume knowledge of the mass attenuation speetrum i(/t) are shown using solid lines whereas dashed 
lines represent methods that assume known L(/t) [i.e., known ineident speetrum of the X-ray maehine and 
mass attenuation (material)]. Blue eolor presents methods that do not employ regularization; red eolor 
presents methods that employ regularization. 

FBP ignores the polyehromatie nature of the measurements and eonsequently performs poorly and does 
not improve as the number of projeetions inereases. Linearized FBP, whieh assumes perfeet knowledge 
of the mass attenuation speetrum, performs mueh better than FBP as shown in Fig. 3. Thanks to the 
nonnegativity and sparsity that it imposes, linearized BPDN aehieves up to 20 times smaller RSE than 
linearized EBP. However, the linearization proeess employed by linearized BPDN enhanees the noise (due 
to the zero-foreing nature of linearization) and does not aeeount for the Poisson measurements model. 
Note that linearized BPDN exhibits a noise floor as the number of projeetions inereases. 

As expeeted, NPG is slightly better than NPG-BEGS beeause it assumes knowledge of !(«:). NPG and 
NPG-BEGS attain RSEs that are 24 % to 37 % that of linearized BPDN, whieh ean be attributed to optimal 
statistieal proeessing of the proposed methods, in eontrast with suboptimal linearization. RSEs of NPG 
and NPG-BEGS reach a noise floor when the number of projections increases beyond 180. 

Eig. 4 shows the reconstructions from 60 equi-spaced fan-beam projections with spacing 6°."^ The EBP 
reconstruction in Pig. 4a is corrupted by both aliasing and beam-hardening (cupping and streaking) artifacts. 
The intensity is high along the outside-towards boundary, the “eye” and the non-convex “abdomen” area 
of the “fish” has positive components, and the ball above the “tail” has a streak artifact connecting to 
the “fish” head. Linearized EBP removes the beam-hardening artifacts, but retains the aliasing artifacts 

''since the density-map reconstruction can only have nonnegative elements, we show the reconstructions in a color-map ranging from 0 to 
its maximum estimated pixel value, which effectively does the negative truncation for the (linearized) FBP methods where the nonnegativity 
signal constraints are not enforced. 
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(d) NPG-BFGS (e) NPG (known l(k)) 


Figure 4: Reconstructions from 60 projections. 


and enhances the noise due to the zero-forcing nature of linearization, see Fig. 4b. Upon enforcing the 
nonnegativity and sparsity constraints, linearized BPDN removes the negative components and achieves a 
smooth reconstruction with a 0.55 % RSE. Thanks to the superiority of the proposed model that accounts 
for both the polychromatic X-ray and Poisson noise in the measurements, NPG-BFGS and NPG achieve 
the best reconstructions, see Fig. 4e and 4d. 

Figs. 5a and 5b show the prohles of each reconstruction at the 250th column indicated by the red line 
in Fig. 4a. In Fig. 5c, we show the scatter plots with 1000 randomly selected points representing FBP 
and NPG-BFGS reconstructions of the C-II object from 60 fan-beam projections. Denote by (ck,X) the 
estimate of (q:,X) obtained upon convergence of the NPG-BFGS iteration. The ^/-coordinates in the scatter 
plots in Fig. 5c are the noisy measurements in log scale —In Sn and the corresponding x-coordinates are the 
monochromatic projections ^^Sfbp (red) and (green) of the estimated density maps; —ln[6^(-)X] 
is the inverse linearization function that maps monochromatic projections to fitted noiseless polychromatic 
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(a) (b) 



Monochromatic projections 


(c) 

Figure 5: (a)-(b) The 250th-column profiles of the reconstructions, and (c) the polychromatic measurements 
as function of the monochromatic projections and corresponding fitted curves. 


projections —lnX°“‘(cK,X). The vertical-direction differences between the NPG-BFGS scatter plot and 
the corresponding linearization curve show goodness of fit between the measurements and our model. 
Since FBP assumes linear relation between —lnoX°“' and $ 0 :, its scatter plot (red) can be fitted by a 
straight line y = x, as shown in Fig. 5c. A few points in the FBP scatter plot with lnX„ = 0 and 
positive monochromatic projections indicate severe streaking artifacts. Observe relatively large residuals 
with bias, which remain even if more sophisticated linear models, e.g., iterative algorithms with sparsity and 
nonnegativity constraints, were adopted, thereby necessitating the need for accouting for the polychromatic 
source; see the numerical examples in [7]. 

Note that the nonnegativity constraints on cx are especially effective in the process of getting good 
estimation of X. Without these constraints, it is even impossible for Step 1) and Step 2) to converge to 
reasonable place. 
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B. X-ray CT Reconstruction from Real Data 

We compare the NPG-BFGS and linear FBP methods by applying them to reconstruct two industrial 
objects containing defects, labeled C-I and C-II, from real fan-beam projections. Here, NPG-BFGS 
achieves visually good reconstructions for u = 10“^, presented in Fig. 6. 

The C-I data set consists of 360 equi-spaced fan-beam projections with 1° separation collected using 
an array of 694 detectors, with distance of X-ray source to the rotation center equal to 3492 times the 
detector size. Figs. 6a and 6b show 512 x 512 density-map image reconstructions of object C-I using 
the FBP and NPG-BFGS methods, respectively. The linear FBP reconstruction, which does not account 
for the polychromatic nature of the X-ray source, suffers from severe streaking (shading that occupies 
the empty area) and cupping (high intensity along the object’s border) artifacts whereas the NPG-BFGS 
reconstruction removes these artifacts thanks to accounting for the polychromatic X-ray source. 

The C-II data set consists of 360 equi-spaced fan-beam projections with 1° separation collected using 
an array of 1380 detectors, with distance of X-ray source to the rotation center equal to 8696 times the 
detector size. Figs. 6c and 6d show 1024 x 1024 density-map image reconstructions of object C-I I by 
the FBP and NPG-BFGS methods, respectively. The NPG-BFGS reconstruction removes the streaking and 
cupping artifacts exhibited by the linear FBP, with enhanced contrast for both the inner defects and object 
boundary. Figs. 6e and 6f show the FBP and NPG-BFGS reconstructions from downsampled C-II data set 
with 120 equi-spaced fan-beam projections with 3° separation. The FBP reconstruction in Fig. 6e exhibits 
both beam-hardening and aliasing artifacts. In contrast, the NPG-BFGS reconstruction in Fig. 6f does not 
exhibit these artifacts because it accounts for the polychromatic X-ray source and employs signal-sparsity 
regularization in (29). Indeed, if we reduce the regularization constant u sufficiently, the aliasing effect 
will occur in the NPG-BFGS reconstruction in Fig. 6f as well. 

Fig. 7 shows the reconstruction profiles of the 337th and 531th rows highlighted by the red horizontal 
lines across Figs. 6c and 6d. Noise in the NPG-BFGS reconstructions can be reduced by increasing the 
regularization parameter u: Figs. 7c and 7d show the corresponding NPG-BFGS reconstruction profiles 
for u = 10“"^, which is 10 times that in Figs. 7a and 7b. 

Observe that the NPG-BFGS reconstructions of C-I and C-II have higher contrast around the inner 
region where cracks reside. Indeed, our reconstructions have slightly higher intensity in the center, which 
is likely due to the detector saturation that lead to measurement truncation; other possible causes could 
be scattering or noise model mismatch. (We have replicated this slight non-uniformity by applying our 
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(c) FBP (d) NPG-BFGS 



(e) FBP (120 projections) (f) NPG-BFGS (120 projections) 


Figure 6: C-I and C-II object reconstructions from fan-beam projections using the FBP and NPG-BFGS 
methods. 
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(a) 337th row (b) 531st row 




(c) 337th row (d) 531st row 

Figure 7: C-II object reconstruction profiles with NPG-BFGS reconstructions using the regularization 
tuning constant (a)-(b) u = 10“® and (c)-(d) u = 


reconstruction to simulated truncated measurements.) This effect is visible in C-I reconstruction in Fig. 6b 
and is barely visible in the C-II reconstruction in Fig. 6d, but can be observed in the profiles in Fig. 7. 
We leave further verification of causes and potential correction of this problem to future work and note 
that this issue does not occur in simulated-data examples that we constructed, see Section V-A. 

In Fig. 8a, we show the scatter plots with 1000 randomly selected points representing FBP and NPG- 
BFGS reconstructions of the C-II object from 360 projections. A few points in the FBP scatter plot 
with ln£’„ = 0 and positive monochromatic projections indicate severe streaking artifacts, which we also 
observed in the simulated example in Section V-A, see Fig. 5c. 

We now illustrate the advantage of using Nesterov’s acceleration in Step 1) of our iteration. Fig. 8b 
shows the objective as a function of outer iteration index i for the NPG-BFGS and PG-BFGS 

methods applied to C-I I reconstruction from 360 projections. Thanks to the Nesterov’s acceleration (31b), 
NPG-BFGS is at least 10 times faster than PG-BFGS, which runs until it reaches the maximum-iteration 
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Monochromatic projections 



# of Iterations 


(a) (b) 

Figure 8: (a) Polychromatic measurements as functions of monochromatic projections and corresponding 
fitted inverse linearization curves, and (b) the objective function as a function of the iteration index i. 


limit. 


VI. Conclusion 

We developed a model that requires no more information than the conventional FBP method but is capable 
of beam-hardening artifact correction. The proposed model, based on the separability of the attenuation, 
combines the variations of the mass attenuation and X-ray spectrum to mass-attenuation spectrum. This 
model solves the single material scenario and convert it to a bi-convex optimization problem. We proposed 
an alternatively descent iterative algorithm to find the solution. We proved the KL properties of the object 
function under both Gaussian and Poisson noise scenarios, and gave primitive conditions that guarantees 
the bi-convexity. Numerical experiments on both simulated and real X-ray CT data are presented. Our blind 
method for sparse X-ray CT reconstruction (i) does not rely on knowledge of the X-ray source spectrum 
and the inspected material (i.e., its mass-attenuation function), (ii) matches or outperforms non-blind 
linearization methods that assume perfect knowledge of the X-ray source and material properties. This 
method is based on our new parsimonious model for polychromatic X-ray CT that exploits separability of 
attenuation of the inspected object. It is of interest to generalize this model beyond X-ray CT so that it 
applies to other physical models and measurement scenarios that have separable structure; here, we would 
construct the most general possible formulation of “separable structure”. For example, our probabilistic 
model in this paper is closely related to GLMs in statistics [31], and can be viewed as their regularized 
(nonnegativity- and sparsity-enforcing) generalization as well. 

The underlying optimization problem for performing our blind sparse signal reconstruction is biconvex 
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with respect to the density map and mass density spectrum parameters; solving and analyzing hi- and 
multiconvex problems is of great general interest in optimization theory, see [39] and references therein. 
An interesting conjecture is that the general version of (ii) holds as a consequence of the separability in 
the measurement model. 

Future work will also include extending our parsimonious mass-attenuation parameterization to multiple 
materials and developing corresponding reconstruction algorithms. 

Appendix A 

Derivation oe the Mass-Attenuation Parameterization 

Observe that the mass attenuation K{e) and incident energy density t(e) are both functions of e, see 
Fig. la. Thus, to combine the variation of these two functions and reduce the degree of freedom, we 
rewrite i{e) as a function of k and set k as the integral variable. 

All mass-attenuation functions K{e) encountered in practice can be divided into piecewise-continuous 
segments, where each segment is a differentiable monotonically decreasing function of e, see [48, Tables 
3 and 4] and [50, Sec. 2.3]. The points of discontinuity in K{e) are referred to as A'-edges and are caused 
by the interaction between photons and K shell electrons, which occurs only when e reaches the binding 
energy of the K shell electron. One example in Fig. 9 is the mass attenuation coefficient curve of iron 
with a single A'-edge at 7.11 keV. 

If the energy range used in a CT scan does not contain any of the A'-edges, K{e) becomes monotonic 
and invertible, which we consider first in Appendix A-I. This simple case, depicted in Fig. la, gives the 
main idea behind our simplified model. 

In Appendix A-II, we then consider the general case where the energy range used in a CT scan contains 
A'-edges. 

I Invertible n{e) 

Consider first the simple case where K{e) is monotonically decreasing and invertible, as depicted in 
Fig. la, and define the differentiable inverse of K{e) as e{K). The change of variables e = e{K) in the 
integral expressions (3a) and (3b) yields 

X'" = j t(e(K))|e'(fi;)| dfi; (Ala) 


(Alb) 
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10-2 L-^^^ 

10 ° 10 ^ 10 ^ 10 ° 10 ^ 
e/keV 

Figure 9: The mass attenuation coefficients k of iron versus the photon energy e with a iT-edge at 7.11 keV. 


Now, we define 


!(£(/.))|e'(/.)I 


(A2) 


with e'{K.) = de{K)/ d/? and obtain (5) by plugging (A2) into (Al). 

As shown in Fig. la, the area L{ej)Aej depicting the X-ray energy within the Asj slot is the same as 
the corresponding area L{Kj)AKj, the amount of X-ray energy that attenuated at a rate within Akj slot. 
In the following, we extend the above results to the case where K{e) is not monotonic. 

II Piecewise-continuous K{e) with monotonically decreasing invertible segments 

Define the domain E of e and a sequence of disjoint intervals {{em, em+i)}m=o Cq = min(E) and 
cm+i = max(E), such that in each interval K{e) is invertible and differentiable. Here, E is the support set 
of the incident X-ray spectrum i{e) and (em)m=i the M iT-edges in E. Taking Fig. 9 as an example, 
there is only one iT-edges at ei given the incident spectrum has its support as ( 60 , 62 ). 

The range and inverse of K{e) within ( 6 m, 6 m+i) are {um,Vm) and EmiK-), respectively, with Um = 
infe^e™+i — sup^^g^ k{£). Then, the noiseless measurement in (3b) can be written as 



(A3a) 


tVl nt) 

r 

^—n ^ 


M 


e 


— K, 


f a(x,y) 


(A3b) 


m=0 



M 


(A3c) 


m=0 


and (5) and ( 6 ) follow. Observe that ( 6 ) reduces to (A2) when M = 0. 
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In this case, suppose there is a ii'-edge with in the range (£^^,£^ 2 ) and = Kj, then the 

amount of X-ray energy, that attenuated at a rate Kj within slot will be the summation of 

i{ej^)Aej^ and where slot Asj^ and Asj^ eorrespond to A/tj. 


Appendix B 

Proofs of Convexity of C,{ol) 


The proofs of eonvexity of for both the lognormal and Poisson noise measurement models share 

a eommon eomponent, whieh we first introduee as a lemma. 

Lemma 7: For l(k) that satisfy Assumption 1, the following holds: 



> 0 


7 Jo 


IIK — 


+ 1)2 


(p + 


l(k)i(p)/7(k + p) d/idfi: 


(Bla) 

(Bib) 


for g > 1 [as in (10b)] and any nonnegative funetion : M —)■ ]R_|_. 

Proof: In Fig. 10, the (/i, n) eoordinates of P, B and N are (kq, 0), (k^q, 0) and (kj+i, 0), respeetively; 
the line OS is defined by k = p. 

Considering the finite support set of l{k), the effeetive integral range is [kq, kj+i]^, whieh is the reetangle 
RMSJ in Fig. 10. Using the symmetry between k and ^ in (Bla), we ehange the integral variables of 
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(Bla) by rotating the coordinates by 90° using 


which yields 


where 




1 

w = - 

2 


V2 kj+i g{fi) 


J J tu(/i, k ) d/t d/r 

V^KQ -gip.) 

V^Kj+i g{p.) 

J J w{fi,R) dR h(^V2ii) dp, 

V2ko 0 


A /- -\ I ^ H \ ( ^^ ^ 

K) — Z[IJ, K)l I -^ I L 


V2 J \ V2 


z{fi, k) = 


9{f^) = 


A ( 1 \ r .2 -2 


/i — K 


q^o + 1 

P-V2ko, P < + i^j+i) 


(Bla) 

(B2b) 


(B3a) 


(B3b) 


(B3c) 

(B3d) 

(B3e) 


y/lKj+i-^i, ix> ^{ko + Kj+i) 

and (B3b) follows because (B3c) is even symmetric with respect to R. Now, the integration region is 
reduced to the triangle RSJ in Fig. 10. 


Note that 2 :(/i, ic) > 0 in the cone between lines OH and 01, [both of which are specified by z{p, R) = 0], 
which implies that w {p, R) > 0 within ROE and CSQ', hence, the integrals of w{p, R)h[y/2p) over ROE 
and CSQ are nonnegative and, consequently. 


where 


w > 


w{iJ, k) dK h{\/2iJt) d/r. 


n 



^ [^0 1 ^jo \) 


11 + K 


.^jo 5 ^j+l] 


(B4) 


(B5) 


is our new integration region, which is the rectangle ECQJ in Fig. 10. 
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Define 

A 

c = --• 

1 + g-?o 


(B6) 


Now, we split the inner integral over R on the right-hand side of (B4) for fixed Jj, into two regions: 
z{fl, R) >0 and z{fl, R) < 0, i.e., trapezoid ECQI and triangle EIJ, and prove that the positive contribution 
of the integral over ECQI is larger than the negative contribution of the integral over the EIJ. 

Note that the line 01 is specified by z{jl, R) = 0 and the (/i, k)- coordinate of I in Fig. 10 is thus 
(kj+i.jq, kj+i). Hence, under Assumption 1, region ECQI C (/Ciow U /Cmid) x /Chigh and region EIJ C 
/Clow X /Chigh- Therefore, the following hold within IZ: 

• When z{fl, R) > 0, i.e., in region ECQI, 


L=i^ > L(cg^V) (B7a) 

y/2 

L_M=i5 > '■{cp) (B7b) 

V2 

where (B7a) follows because k = takes values between and cq^°jl G [kj^, kj+i], i.e., k G /Chigh 
and i{K) decreases in /C^gh; (B7b) follows because /i = takes values between cp, G [kq, kj+i-jq] 
and KjQ, i.e., /i crosses /Ciow increasing) and /Cmid (t-(fi) keeping high) regions, see (15b). Here, 
{cp, cq^°p) is the (p, k)- coordinate of one point on line 01 specified by p in {p, K)-coordinate system. 
• When z{p, R) < 0, i.e., in region EIJ, 


l(«) L-m+ii < L(cg^v) (B7c) 

L(/i) L_1^ < ^{cp) (B7d) 

where (B7c) follows because k = > cq^°p, i.e., k G /Chigh and (B7d) follows because /i = < 

cp, i.e., jjj G /Clow- 


By combining (B7) with (B4), we have 


w > 11 z{fi, k) Pk h{fi) dp. 

'■R 


K.J + 1 + KJ+1_ 


^2 




z^n, k) d/tdp 


'^o+«jo \R\(n.,ii)&n\ 

72 


(B8a) 

(B8b) 
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with 


h{n) = i[cq^° 
> 0 . 


(B8c) 


It easy to verify that /r^, k) dn is an inereasing funetion of /i over the range of the outer 


integral 


{k, I (/^,k)g7^} 

^ J+1 “1“^ J^+1 — jq 




V2 


and, eonsequently, 


J z{fi, k) d/t > 0 (B9) 

{R I {fi,R)&TZ} 

where the equality is attained for p, = (kq + Kjg)/y/2. Finally, (Bl) follows from (B8a) and (B9). ■ 


This proof of eonvexity of Lemma 7 is eonservative as we loose the positve integral in region RCE 
and CS'Q to zero. 


Now, we use Lemma 7 to prove the eonvexity of Ci{ct) in Lemmas 2 and 4. Note that the mass- 
attenuation speetrum l(/t;) is eonsidered known in Lemmas 2 and 4 and define .^(■) = i^(-) and the 
eorresponding first and seeond derivatives: 

^(s) = (-fi;L)^(s), ^{s) = (BIO) 

Observe that notational simplieity, we omit the 

dependenee of on a. and X and use X™' and ,^o(*1’ck) interehangeably. 


Proof of Lemma 2: We use the identities 


<9^0 (*^Q:) 
daf 

dotdcx^ 


diag(^o($Q:))<l> 

^’(0n«)0n0n 


to eompute the gradient and Hessian of the lognormal NLL in (14b): 

= $^diag(t($a)) diag-' (X°“') (inoX^*^' - Ino 5) 
diag-2 (X™') diag (m) $ 


(Blla) 

(Bllb) 


(B12a) 

(B12b) 
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where the iV x 1 vector w = {wn)n=i is defined as 


Wn = + t{s) (lnX™‘ - \n£n) 


(B12c) 

(B12d) 


For s > 0, 


t{s) = / L(K)e / K^v{K)e 


Ki{K)e dn 


> 0 


(B13a) 


where we used the Laplace-transform identity for derivatives (Ic) and the inequality follows by using the 
Cauchy-Schwartz inequality. We can write t{s) as 


t{s) = 2 ' ' (« —/r)^L(fi;)L(/r)e d/i 


(B13b) 


which also implies that t{s) > 0. We obtain (B13b) by using the Laplace-transform identity for derivatives 
(Ic), combining the multiplication of the integrals, and symmetrizing the integral expressions by replacing 
k'^ with -f /i^)/2. 

According to (18a), lnX°“' — ln£^„ > —U, see also (18b). Because of positivity of t{s) for s > 0, 


U!n > - Ut{s) 


flK — U 


s=4>l^a 

(/r - nf 


L(/i)L(/T;)e 


-{H+K)cj)'^a 


dudfi 


-f 1 

qjo - 1 


UK, 


q- 


JO 


-(/i -f k) 


i{fi)i{K)e 




dndfi 


(B14a) 

(B14b) 

(B14c) 


(g^« + 1)2 

where (B14b) is obtained by plugging (B13b) into (B14a), and we have used the definition of U (18b) in 
(B14c). 


Using Lemma 7 with h^n) = e we have Wn> 0 for all n. Therefore, the Hessian matrix (B12b) 

of Xi(q:) is positive semidefinite. ■ 

Proof of Lemma 4: We use the identities (Bll) to compute the gradient and Hessian of the Poisson 
NLL (23): 


= $'^diag(^o($Q:)) [l - diag ^ (X™‘) £] (B15a) 

f)n ( 

dadoL^ = diag“^ (X°“‘) diag (£) diag {x) $ (B15b) 
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where the iV x 1 vector x = {xn)n=i is defined as 

Xn = i^{s) + '({s)i{s) 


'7'OUt 

^rt. 


S=4>ioL 


Since — V)Sn > 0 according to (24a), we have 

IT 


Now, 


£ 

<-'71 


1 > -V. 


Xn> II — Tv) L{jj,)i{K)e dndjj, 

( 

[IK 


2 2 

^ V) L(/i) dn d/i 


> 


+ 1)^ 


g2io + 1 


flK — 


Q- 


JO 


(gio +1)2 


(/i + nf] i{T L(K)e-(^+^)‘^"“ d/t d/r 


> 0 


(B15c) 


(B16) 


(B17a) 

(B17b) 

(B17c) 

(B17d) 


where (B17a) follows by applying inequality (B16) to (B15c), using the Laplace-transform identity for 
derivatives (Ic), and combining the multiplication of the integrals, and (B17b) is due to the symmetry with 
respect to /i and k by replacing T with {T + T)T- Now, plug (24b) into (B17b) and apply Lemma 7 with 
h{K) = to conclude (B17d). Therefore, the Hessian of Xi(q:) in (B15b) is positive semidefinite. 


Appendix C 

Proofs of KL Property and Convergence of PG-BFGS Iteration 

Proof of Theorem 2: According to [39], real-analytic function [51], semialgebraic functions [52] and 
their summations satisfy the KL property automatically. Therefore, the proof consists two parts: 

(a) The NLLs in (13) and (22), are both real-analytic functions of {cx,I) on C C dom(/); 

(b) Both r{ct) in (29b)-(29a) and I[o,+oo)(X) are semialgebraic functions; 

and, consequently, the objective function f{ct,I) satisfies the KL property. 

Real-analytic NLLs. The NLLs in (13) and (22) are in the form of weighted summations of terms 
In ’ and In^ for n = 1,2, ■ ■ ■, N. Weighted summation of real-analytic 
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functions is real-analytic; hence, we need to prove that the following are real-analytic functions: 


li{t) = b^[4>^{cx + t7))(X -f tJ) 

(Cla) 

kit) = \n[b^ [(f)^ {a + t-f)){I + tJ)] = \nli{t) 

(Clb) 

hit) = llit) 

(Clc) 


Since are smooth, it is sufficient to prove that the mth derivatives, are bounded for all 

m, {ct,T), ( 7 , J') and t such that (a + + tj') G dom(/). The mth derivative of li{t) is 

4"^ = ( 0 ^ 7 )™ (a + tj)(X + tJ) + m(0^7)™-i (a + t-i)J (C2) 

which is bounded for any ol, X, 7 , J' and t such that (ck + tj,I + tJ') is in one of compact subsets 
C C dom(/). For any compact set C C dom(/), there exists e > 0 such that li{t) > e, for all {cx + t^,I + 
tj') G C. Hence, In(-) and (■)^ are analytic on [e, -fcx)). Now, since the compositions and products of 
analytic functions are analytic [53, Ch. 1.4], we have that both l 2 {t) and are analytic. 

Therefore, the NLLs in both (13) and (22) are analytic. 

Semialgebraic regularization terms. According to [39], i) the ^2 and norms 11-112 and H-Hi are 
semialgebraic, ii) indicator function I[o,+oo)(-) is semialgebraic, iii) finite sums and products of semialgebraic 
functions are semialgebraic, and iv) the composition of semialgebraic functions are semialgebraic. Therefore, 
II'F^ckIIi, I[o,+oo)(ck) and I[o,+oo)(^) are all semialgebraic. Since ~ = Il-Pi<^ll 2 some 

matrix Pj is semialgebraic, r{cx) in (29a) is semialgebraic. 

Finally, according to [39], sum of real-analytic and semialgebraic functions satisfies KL property. 
Therefore, satisfies the KL property on dom(/). ■ 

Proof of Theorem 3: We apply [39, Lemma 2.6] to establish the convergence of { 

Since (13), r{ct) in (29) and I[o,+oo)(^) are lower-bounded, we only need to prove that (22) is lower- 
bounded. By using the fact that In a; < a; — 1, we have 

C{cx,I) = ^ y ^ (C3a) 

n=l ” 

> 0 (C3b) 

According to the assumption, we have f{ct,I) strongly convex over X and bounded the step size 
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So, there exist eonstants 0 < / < L < +cx) sueh that 




Henee, we have verified all eonditions of [39, Lemma 2.6]. 


(C4a) 

(C4b) 
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